Product, building, and infrastructure material stocks dataset for 337 Chinese cities between 1978 and 2020

Reliable city-level product, building, and infrastructure material stocks data are essential for understanding historical material use patterns, benchmarking material efficiency, and informing future recycling potentials. However, such urban material stocks data are often limited, due primarily to unavailable, inconsistent, or noncontinuous city-level statistics. Here, we provided such an Urban Product, Building, and Infrastructure Material Stocks (UPBIMS) dataset for China, a country that has undergone a remarkable urbanization process in the past decades, by collating different official statistics and applying various gap-filling methods. This dataset contains the stock of 24 materials contained in 10 types of products, buildings, and infrastructure in all 337 prefecture-level cities in China from 1978 to 2020. This quality controlled and unified dataset is the first of its kind with such a full coverage of all prefecture-level Chinese cities and can be used in a variety of applications, for example in urban geography, industrial ecology, circular economy, and climate change mitigation. Every piece of data is tagged with its source and the dataset will be periodically updated.


Background & Summary
Urbanization is one of the most important factors in human history driving the global consumption of natural resources to ever-higher levels. For example, the stocks of man-made materials were estimated to already outweigh all life on Earth 1 , leading to profound changes in essential life-sustaining functions of the planet. The United Nations Environment Programme (UNEP) warned that further urbanization and demand for urban services such as sheltering and mobility could raise the annual use of resources to nearly 90 billion tonnes by 2050, a 125% increase from 40 billion tonnes in 2010 2 . Sustaining long-term resource use and minimizing the environmental impacts that accompany continuous urbanization thus requires a good understanding of how we have been accumulating materials in products, buildings, and infrastructure (so-called in-use stocks) for urban services 3 . The spatiotemporal patterns of such stocks can help reveal the exchange, storage, and transformation of materials between the natural environment and cities, benchmark material efficiency 4 , and inform future resource demand, waste management challenges, and urban mining potentials [5][6][7][8] .
As the world's most populous country and second-largest economy, China is an ideal case for exploring the historical development of material stocks at the city level 9 . The enormous rural-urban population migration and growth of cities in China in the past four decades have resulted in a profound expansion of urban built environment stocks 10,11 . For example, between 1990 and 2010, China's share in global material stocks increased from 10% to 22% 12 . This continuously expanding urban resource demand further increases pressure on China's already pressing resource, waste, and climate challenges 13,14 .
Generally, in-use material stocks can be estimated by either a top-down or a bottom-up approach 5 . The top-down method is often employed to evaluate global and national material stocks by considering the differences between inflows (often available from industrial statistics) and outflows (estimated based on lifetime delay) of materials 15,16 . However, this is often challenging for material stock estimation at the regional or city level due to the lack of material consumption (inflow) statistics. Instead, the bottom-up method is often used to estimate urban material stocks by counting each item of products, buildings, and infrastructure and multiplying by their corresponding material intensity 4 . For China, for example, a few studies have calculated in-use material stocks at the national 17,18 and provincial levels 18 , with resolution by product (e.g., infrastructure and household durable goods) and by material (e.g., 24 types of materials considered in some studies 14,19,20 ). There have also been a few attempts to characterize material stocks for large cities (e.g., Beijing 21,22 , Chongqing 23 , and Xiamen 24 ) in China; yet the overall city coverage is very low, and medium and small cities are usually excluded. This relates mainly to data gaps at the city level caused by, e.g., inconsistent or noncontinuous local statistics and changes of administrative areas. Considering that there are 337 prefecture-level cities in China, at varying urbanization stages and with distinct developmental pathways 25 , it is important to develop a complete dataset that covers all cities and years to reveal the spatiotemporal patterns of urban product, building, and infrastructure material stocks 20 .
In this study, we aim to provide such an Urban Product, Building, and Infrastructure Material Stocks (UPBIMS) dataset for China, compiled by collating urban material stock related statistical data from various yearbooks, bulletins, and agencies. Moreover, we have filled gaps for missing data (about 55% of all the data records) based on rational principles and consistent assumptions. Eventually, based on a bottom-up stock accounting method, we present a dataset on the total weight and 24 types of material contained in 10 subtypes of products, buildings, and infrastructure (namely residential buildings, nonresidential buildings, roads, urban rails, pipes, other infrastructure, vehicles, agricultural machinery, industrial machinery, and appliances) in active use in 337 cities of mainland China from 1978 to 2020.
The rest of the paper summarizes the accounting scopes, data sources, and gap filling approaches we used and the quality of this dataset. We will maintain and periodically update this dataset in the future. This comprehensive and consistent dataset of material stocks of all Chinese prefecture-level cities can help understand spatiotemporal patterns of urban weight growth, inform future materials demand associated with continuous urbanization, optimize construction and demolition waste management, facilitate discussion on embodied emissions of construction, and thus support the circular and low-carbon transition of cities in China and beyond.

Methods
Spatial and temporal boundary. Based on China's administrative divisions, our dataset covers all 337 major cities in China, including four municipalities directly under the Central Government (i.e., Beijing, Shanghai, Tianjin, and Chongqing) and 333 prefecture-level cities belonging to 23 provinces and 5 autonomous regions. Their codes and changes of names in the past 43 years are available in sheet 'Code' in file 'Data source.xlsx' on Figshare 26 . Our spatial accounting scope is municipal districts and townships 23 . Together, these 337 cities host over 64% of China's population and cover more than 75% of the built-up area in China. The time horizon of our dataset is from 1978 (when China started its reform and opening-up and thus cities started to grow) to 2020 (the latest year for which data are accessible). All computations were performed in time-discrete steps of one year.
Material stocks of products, buildings, infrastructure. We used a bottom-up stock accounting method 15,27 and counted all pieces of products, buildings, and infrastructure in cities over time to determine their embodied material stocks. The scope of urban product, building, and infrastructure stock items and corresponding estimation methods and data sources for their embodied material stocks are summarized in sheet 'Scope' in file 'Data source.xlsx' on Figshare 26 and described in detail in the following sections.
We have attempted to include all products, buildings, and infrastructure in these cities. This adds up to 10 types of products, buildings, and infrastructure (details in sheet 'Scope' in file 'Data source.xlsx' on Figshare 26 ). Based on significance of embodied materials and available material intensity data (detailed in file 'Data source.xlsx' and 'Material intensity.xlsx' on Figshare 26 ), we considered 24 types of materials, including 13 types of base materials (steel, copper, aluminum, timber, brick, gravel, sand, asphalt, lime, glass, cement, plastic, and rubber), 3 types of precious metals (gold, silver, and palladium), 4 types of rare metals (indium, neodymium, yttrium, and europium), and 4 types of other metals (lead, zinc, magnesium, and cobalt).
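The bottom-up accounting described above can be sketched as follows: the stock of each material is the sum, over all stock items, of item quantity times material intensity. This is a minimal illustration, not the code used to build the dataset. The helper `material_stock`, the item names, and all numbers are hypothetical placeholders.

```python
from collections import defaultdict


def material_stock(quantities, intensities):
    """Sum quantity x material intensity over all stock items.

    quantities:  {item: amount in the item's own unit (m2, km, units, ...)}
    intensities: {item: {material: tonnes of material per unit of item}}
    Returns {material: tonnes}.
    """
    stock = defaultdict(float)
    for item, amount in quantities.items():
        for material, tonnes_per_unit in intensities[item].items():
            stock[material] += amount * tonnes_per_unit
    return dict(stock)


# Illustrative numbers only (not taken from the dataset).
quantities = {"residential_buildings_m2": 1_000_000, "roads_km": 200}
intensities = {
    "residential_buildings_m2": {"steel": 0.05, "cement": 0.2},
    "roads_km": {"asphalt": 500.0, "cement": 100.0},
}

stock = material_stock(quantities, intensities)
total_weight = sum(stock.values())  # total tonnes across all materials
```

Keeping quantities and intensities in separate mappings mirrors the separation between the stock item statistics and the material intensity file, so either input can be revised without touching the other.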
Data sources. The data on urban product, building, and infrastructure stock items are from official statistics or otherwise estimations based upon rational principles and consistent assumptions. A total of 1259 official statistical bulletins or yearbooks were compiled and used to derive data for population and products, buildings, and infrastructure (details in sheet 'References' in file 'Data source.xlsx' on Figshare 26 ). These original statistical data, however, only directly report 45% of all possible stock data across city, time, and type of stock. In the following sections, we have described different sources and responsible departments of such statistics and how we have filled the data gaps systematically (details in sheet 'Statistics' in file 'Data source.xlsx' on Figshare 26 ).
Household survey data for buildings and appliances. We obtained the stock of urban residential buildings and home appliances by multiplying the per capita (or per household) stock by the population (or the number of households). This is because per capita residential floor area (PC-RA) and the number of home appliances per 100 urban households (PH-UA) are covered by the household survey of the National Bureau of Statistics of China (NBSC).
www.nature.com/scientificdata www.nature.com/scientificdata/ This household survey is based on a stratified multistage random sampling method. For example, in 2021, the NBSC selected about 160,000 urban households from 1,800 counties/districts across China. At the end of each quarter, the survey results are aggregated upward from the county level to the national level and published by the NBSC. This data summary method means that the PC-RA and PH-UA data are quarterly and county/district based. It should be noted that, after China introduced the urban-rural integration plan in 2013, the sampling scope expanded accordingly from municipal districts to municipal districts and townships in the urban-rural fringe. Nevertheless, the sample size of townships is much smaller than that of municipal districts, so the sampling scope change affected only 24% of the cities (with a decrease or increase of stock by more than 10%). Therefore, we have disregarded this scope change for the sake of consistency in statistical caliber.
These aggregated national and provincial data are annually published in the China Statistical Yearbook, China Household Survey Statistical Yearbook, and the National Statistical Data Release Database. However, it is often difficult to get the data for successive years at the municipal or city level, due to lack of strict rules on the frequency of data release. The local governments and statistical departments publish survey results only for selected years in their statistical bulletins and yearbooks.
It should be noted that there are no official data on nonresidential building floor areas in China. A few previous studies simply assumed the stock of nonresidential buildings to be the same as that of residential buildings 15,22,23 . We have improved this estimation by approximating the ratios of residential to nonresidential building floor areas of these cities with their corresponding, and largely available, provincial ratios.
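The scaling steps described in this subsection can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the household count is derived from population and persons per household, and every number is a made-up placeholder.

```python
def residential_floor_area(per_capita_area_m2, permanent_population):
    """PC-RA (m2 per person) times urban permanent population."""
    return per_capita_area_m2 * permanent_population


def appliance_stock(units_per_100_households, households):
    """PH-UA (units per 100 households) scaled to the number of households."""
    return units_per_100_households / 100 * households


def nonresidential_floor_area(residential_area_m2, res_to_nonres_ratio):
    """Divide residential floor area by the provincial residential-to-
    nonresidential ratio, used here as a stand-in for the city's own ratio."""
    return residential_area_m2 / res_to_nonres_ratio


# Illustrative inputs only (not taken from the dataset).
population = 2_000_000
households = population / 2.5          # assumed persons per household
res_area = residential_floor_area(30.0, population)
air_conditioners = appliance_stock(120.0, households)
nonres_area = nonresidential_floor_area(res_area, 2.0)
```

The resulting floor areas and appliance counts are item quantities. They become material stocks only after multiplication by the corresponding material intensities.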
Due to data gaps and our focus on urban areas, the building stocks in rural areas within the urban municipal administrative boundaries are excluded from this analysis. The building stocks in rural areas did not experience a dramatic change compared to urban building stocks. Indeed, they grew by less than 18% from 2000 to 2015 28 , and this growth rate is further declining 29 with continuous urbanization.
Ministerial statistics for infrastructure, machinery, and vehicles. Infrastructure, machinery, and vehicle stock data are collected and maintained by the respective ministries in China, although not always on a consistent basis. For example, the stock of all vehicles (including passenger vehicles, trucks, and motorcycles) is tracked by the Ministry of Public Security (MPS) and its local branches. The vehicle stock data at the city/municipality level are published by the local statistical bureaus, while the NBSC publishes national and provincial aggregated data in the China Statistical Yearbook. We were able to gather data on vehicle stocks from the local statistical yearbooks and local statistical bulletins of all 337 cities.
Similarly, the Ministry of Housing and Urban-Rural Development (MOHURD) is in charge of the infrastructure data in China. Since 2002, the MOHURD has published end-of-year infrastructure stock data at the city level in the China Urban-Rural Construction Statistical Yearbook and China Urban Construction Statistical Yearbook, covering roads, subways, pipelines, and street lamps. The data for previous years can be obtained from the statistical bulletins and yearbooks published by local governments and the local statistical bureaus.
The Ministry of Agriculture and Rural Affairs (MARA) is responsible for the administration and compilation of agricultural machinery data. The stock (installed capacity) of agricultural machinery can be obtained from the local statistical yearbooks and local statistical bulletins of all 337 cities. The stock of industrial machinery is unfortunately not available in any statistics. Instead, we approximated its installed capacity based on the stock of agricultural machinery and the total power consumption by agricultural and industrial machinery. This method has been applied in quite a few similar studies because of the unavailability of industrial machinery data 23,24,[30][31][32] , most of which indicated high uncertainty but low sensitivity for this method 23 .
Population and urbanization. The data on population and the number of families in the cities are key to our results, particularly for scaling up per capita residential floor areas and per 100 households appliances, and they were obtained from population surveys initiated by the NBSC and local governments. To estimate material stocks in municipal districts and townships, we chose to use the urban permanent population which refers to the actual population living there for over six months, as opposed to household registry population. This is also consistent with the sampling scope of household survey of the NBSC (source for our residential building and appliances data).
The permanent population data are from three types of population surveys in China. The first is the country-wide census, which is conducted every ten years. The data for years ending in zero come from this census and can be obtained from the NBSC. The second is the sample survey of 1% of the national population, which is conducted every five years. The data for years ending in five come from this survey and can be obtained from the provincial bureaus of statistics. The third is the national population change survey, which is conducted every year except for the years when the first two types are conducted. Its sample size accounts for about 1‰ of the total population. However, these permanent population data are not published consistently across cities. We filled in the missing data based on trend extrapolation between total urban permanent population and urbanization rate of cities. This is deemed acceptable because overall the missing share of total population in all 337 cities is only 11% (details in sheet 'Statistics' in file 'Data source.xlsx' on Figshare 26 ). The number of families in the 337 cities was estimated based upon the per household population from the household survey.
Data gap filling and estimation. Significant amounts of missing values can be observed in the initially compiled database. This relates, for example, to either discontinuity in the data released by prefecture-level governments and statistical agencies, or the change of scope of urban administrative regions over time (see Fig. 1a). We have documented the original data source of each item used in the formula and calculated the missing ratio of all items in stock calculation functions in 'Data source.xlsx' on Figshare 26 . Depending on the patterns and proportions of missing values of these items across 43 years (from 1978 to 2020) in 337 cities, three different methods were used either individually or in combination (see Fig. 1b) to fill the gaps:
• Method I estimation is based partially on linear interpolation. We used this method to fill data gaps for those series with a missing rate below 10% or with missing data between two years. This 10% threshold is set based on the statistical principle 33 that if the missing rate is greater than 10%, the result of subsequent statistical analyses may be biased.
• Method II estimation is based partially on the provincial stock growth rate. We used this method to fill gaps for those series with a missing rate above 10%. This is particularly the case for continuously missing values before the starting year or after the ending year with values in the series. The provincial stock growth rate, which was derived from available statistics at the provincial level, was applied to the city level to estimate these missing values.
• Method III estimation is based entirely on proxy. We used this method to fill gaps where there are no city-level data, mainly for three types of issues. The first refers to missing household survey values for some cities in recent years; we assumed them to be the same as the provincial averages. The second refers to missing absolute volumes for some cities; we approximated them by multiplying provincial data by these cities' share of GDP in their provinces. The third relates to the conversion ratio assumptions among products, as we elaborated above for nonresidential buildings (based on residential buildings and land use) and industrial machinery (based on agricultural machinery).
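The three methods above can be sketched with pandas as follows. This is a hedged illustration, not the released gap-filling code. All function names and numbers are hypothetical, and applying the provincial growth rate by scaling from the nearest observed city value is one plausible reading of Method II. The city and provincial series are assumed to share the same yearly index.

```python
import numpy as np
import pandas as pd


def missing_rate(series):
    """Share of missing years in a series (compared against the 10% threshold)."""
    return series.isna().mean()


def fill_interior_linear(city):
    """Method I (sketch): linear interpolation, only between observed years."""
    return city.interpolate(method="linear", limit_area="inside")


def fill_edges_with_provincial_growth(city, province):
    """Method II (sketch): fill years before the first and after the last
    observed value so that the city follows the provincial growth rate."""
    first, last = city.first_valid_index(), city.last_valid_index()
    if first is None:
        return city.copy()  # no observation at all: a proxy is needed instead
    backward = city.loc[first] * province / province.loc[first]
    forward = city.loc[last] * province / province.loc[last]
    filled = city.where(city.index >= first, backward)
    return filled.where(filled.index <= last, forward)


def proxy_from_gdp_share(province_value, city_gdp, province_gdp):
    """Method III, second case (sketch): provincial value times GDP share."""
    return province_value * city_gdp / province_gdp


# Illustrative series only (not taken from the dataset).
years = range(2000, 2007)
city = pd.Series([np.nan, 10.0, np.nan, 14.0, 16.0, np.nan, np.nan], index=years)
province = pd.Series([80.0, 100.0, 120.0, 140.0, 160.0, 176.0, 192.0], index=years)

filled = fill_edges_with_provincial_growth(fill_interior_linear(city), province)
```

In this toy series the interior gap in 2002 is interpolated first, and the leading and trailing gaps are then scaled with the provincial series, which corresponds to using the methods in combination.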
In the end, we filled all the data gaps, which initially accounted for approximately 55% of all data points. Although 17% of the data were populated using Method III (proxy) alone, more than 50% of these proxy data were actually based on household surveys (details in sheet 'Statistics' in file 'Data source.xlsx' on Figshare 26 ). This means that most of the missing data were filled on the basis of some form of official statistics. Hence, our imputed dataset is deemed reasonably accurate to reveal the overall spatiotemporal patterns of product, building, and infrastructure material stocks among cities.

Technical Validation
Data overview. The material stocks in Chinese cities increased steadily during 1978-2020, especially in the early 21st century, when the growth rate peaked (Fig. 2a). Material-wise, non-metallic materials used in buildings and infrastructure such as sand (39%-37%) and gravel (41%-40%) take up the largest portion among all materials over time due to their bulky nature and large volume of use in construction (Fig. 2b). The growth rate of these materials (Fig. 2c), however, has slowed since 2015 when China's infrastructure boom started to level off (see the decreased proportion of roads and non-residential buildings from 1978 to 2020 in Fig. 2h). In contrast, the growth of metal stocks has shown a steadily increasing trend (Fig. 2d-f) with the increasing ownership of buildings, appliances, and vehicles in urban households (see an overview in Fig. 2h and more details on Figshare 26 ). Regionally, China's urban material stock shows an uneven geographical distribution between cities in the southeast and in the northwest, which is in accordance with China's uneven pattern of urbanization, as indicated by the Hu Line (that is, the "Aihui-Tengchong" line, a famous dividing line of China's climate, population, economic, and social patterns between the southeast and northwest; see more details in Fig. 2g and sheet 'Code' in file 'Data source.xlsx' on Figshare 26 ) 48,49 . Cities in eastern China have always been among the top across the country in terms of urban material stocks from 1978 to 2020, and the difference between eastern and western cities has widened (Fig. 2h). The total material stocks and per capita material stocks of all 337 cities have been growing in the past 40 years. Although the difference among total material stocks of the 337 cities is tremendous (Fig. 2i), per capita material stocks of different cities appear comparatively more balanced (Fig. 2j).
The growth rate of per capita stocks in small cities with low stock is equal to or even higher than that in large cities with high stock. For example, Beijing's total urban material stock growth between 1978 and 2020 was 176 times that of Ngari. Yet the per capita stock in Ngari was 297 t in 2020, more than twice Beijing's 123 t (see Fig. 3g,f, with more details in 'Material stock and population 1978-2020.xlsx' on Figshare 26 ).
It is noteworthy that the spatial distribution of per capita urban material stocks is different from that of the total stock. Although approximately 95% of urban total material stocks in 2020 were accumulated to the east of the Hu Line 49 , many cities in the west actually have higher per capita stocks. This contrast reflects the varying composition of in-use products. For example, the top 10 cities with the highest proportion of infrastructure stocks are located in the sparsely populated west of the Hu Line and have very high per capita stocks, while most densely populated cities in the east have a higher share of stock in buildings (see Fig. 3 for an overview and more details on Figshare 26 ). This also indicates the economies of scale and higher efficiency of material use for service provision in denser and more developed cities.
Comparison with existing estimates. When our stock estimates are compared with literature values (Table 1), the material stocks results are overall close to studies based on similar research methods. We want to highlight below two major differences between our scope and data sources and that of previous studies.
First, the total stock of some materials (e.g., gold 14 , steel 34 , and plastic 35 ) appears moderately different from that of previous studies on the country level, due mainly to scope differences. Our spatial scope is limited to the urban area, and we have not considered the rural buildings and rural appliances. Meanwhile, we have included fewer items of products, buildings, and infrastructure in stock accounting, as our much wider city coverage does not allow for data collection for all cities, e.g., railways, water and environmental infrastructures, and appliances for commercial use.
Second, we explicitly considered the household survey and population statistical scope when estimating the building and appliance stocks, and used the permanent population (in line with the sampling scope of the urban household survey) to estimate the total floor area and household appliances instead of the household registry population. Therefore, residential building stocks in our dataset are generally higher than in previous bottom-up studies (see the examples of Beijing 39 and Shanghai 50 ).
Overall, despite some limitations in our research scope, our quality controlled and unified dataset is deemed reliable and is the first of its kind with such a full coverage of all prefecture-level Chinese cities. It can thus be used in a variety of applications, for example in urban geography, circular economy, and climate change mitigation research.
Limitations and uncertainties. Despite our best efforts, our dataset bears some inevitable uncertainties and limitations, which are pointed out below and remain to be addressed in the future to improve accuracy.
• First, due to data paucity, 55% of the data are estimated by data gap-filling methods. Although the assumptions for gap filling were based on reliable official data and justified principles, this scale of data filling would unavoidably result in systemic errors in the imputed database. This can be better validated and cross-checked by more independent estimation based on other methods in the future.
• Second, the material intensities derived from literature come with high uncertainty, due to unfortunately very little information in this regard, not to mention region-specific differences. For example, the material intensity may vary by time (while we only considered the material intensity changes for a few products such as TVs, computers, and buildings) and by size and brand of products (which we did not consider at all). Although this is a common limitation in most previous bottom-up material stock studies 32 , it would be beneficial to establish more community-wide and reliable material inventory databases for key products and infrastructure in future research, as exemplified for buildings 19,20 .
• Third, the various sources of statistics from different yearbooks, bulletins, and agencies bear uncertainty as well. For example, the household survey data used in the calculation are obtained from the NBSC survey on 1% of the population and do not fully cover all urban residents, and the population statistical scope changes over time as well, as we elaborated in the Methods section. This uncertainty may lead to shrinking building stocks in some cities as their population decreases (e.g., cities in northeast China like Siping).
• Fourth, our research scope is limited to urban areas, while stocks of some products such as agricultural machinery and industrial machinery could not be split explicitly between urban and rural; and the stock of industrial machinery is approximated by agricultural machinery, thus with high uncertainty. Nevertheless, we can confirm that agricultural and industrial machinery together accounted for only a small proportion (below 0.4% in 2020) of the total material stocks, so this would not substantially affect our overall results.
To assess the sensitivity and compare the impact of each parameter (i.e., product quantity and material intensity), we conducted a one-at-a-time sensitivity analysis (e.g., by increasing one parameter by 10% while keeping others unchanged) to examine the effect of variation of parameters separately 51 . Figure 4 shows the sensitivity analysis results for the average stocks of the 337 cities in the year 2020, and results across all 43 years and other details are available in sheet 'Sensitivity' in file 'Data source.xlsx' on Figshare 26 . We can see that among product quantities, building floor areas have the largest impact on the total material stocks, while for material intensities, most materials have a negligible impact on total material stock (less than 1%), except for sand and gravel (above 3%).
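The one-at-a-time procedure described above can be sketched as below: each parameter is increased by 10% in turn while all others are held fixed, and the relative change of the total stock is recorded. This is a minimal illustration with hypothetical function names and made-up numbers. It perturbs intensities per item and material, which may differ from the aggregation used for Figure 4.

```python
import copy


def total_stock(quantities, intensities):
    """Total weight: sum over items of quantity x summed material intensities."""
    return sum(
        amount * sum(intensities[item].values())
        for item, amount in quantities.items()
    )


def one_at_a_time(quantities, intensities, delta=0.10):
    """Relative change in total stock when one parameter rises by `delta`."""
    base = total_stock(quantities, intensities)
    effects = {}
    for item in quantities:
        bumped = dict(quantities)
        bumped[item] *= 1 + delta
        effects[("quantity", item)] = total_stock(bumped, intensities) / base - 1
    for item, per_material in intensities.items():
        for material in per_material:
            bumped = copy.deepcopy(intensities)
            bumped[item][material] *= 1 + delta
            effects[("intensity", item, material)] = (
                total_stock(quantities, bumped) / base - 1
            )
    return effects


# Illustrative inputs only (not taken from the dataset).
quantities = {"buildings": 100.0, "roads": 10.0}
intensities = {
    "buildings": {"sand": 0.5, "steel": 0.05},
    "roads": {"gravel": 2.0},
}
effects = one_at_a_time(quantities, intensities)
```

Because the accounting is linear in each parameter, the relative effect of a 10% increase equals 10% of that parameter's share in the total stock, which is why bulky materials and building floor areas dominate.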
All these uncertainties and limitations deserve special attention when this dataset is used and interpreted. We also aim to address them in the future, e.g., by expanding the product and materials selection and improving the material intensity estimation. This data structure could also be extended to smaller cities below the prefecture level (i.e., county level). Since every piece of data is tagged with its source in the dataset, we plan to periodically update this dataset.

Usage Notes
The data files are provided as xlsx files, which can be readily read and processed by many software packages, such as Matlab, R, and Python. The 24 materials can be segmented into base materials, precious metals, rare metals, and other metals. The file 'Material intensity 1978-2020.xlsx' on Figshare 26 records the material intensities and their accounting scope in detail. Meanwhile, data sources are provided for each data record and our assumptions for estimated data are explicitly explained, so our estimates can be changed or excluded whenever needed. In addition, the data filling code provided in this study can also be used to fill other types of statistical data, depending on the applicable principles.
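As a hedged starting point for working with the files in Python, the sketch below reads all sheets of a workbook with pandas and aggregates material columns into the four material groups named in the Methods. The helper names are hypothetical, and the assumed layout (one column per lowercase material name) is an illustration only, so the column names should be checked against the actual files.

```python
import pandas as pd

# Grouping of the 24 materials as listed in the Methods section.
MATERIAL_GROUPS = {
    "base materials": [
        "steel", "copper", "aluminum", "timber", "brick", "gravel", "sand",
        "asphalt", "lime", "glass", "cement", "plastic", "rubber",
    ],
    "precious metals": ["gold", "silver", "palladium"],
    "rare metals": ["indium", "neodymium", "yttrium", "europium"],
    "other metals": ["lead", "zinc", "magnesium", "cobalt"],
}


def load_all_sheets(path):
    """Read every sheet of an xlsx file into a {sheet name: DataFrame} dict."""
    return pd.read_excel(path, sheet_name=None)


def group_totals(stocks):
    """Sum material columns into the four groups.

    Assumes one column per material; materials absent from `stocks` are skipped.
    """
    totals = {}
    for group, materials in MATERIAL_GROUPS.items():
        present = [m for m in materials if m in stocks.columns]
        totals[group] = stocks[present].sum(axis=1)
    return pd.DataFrame(totals)


# Illustrative table only (not taken from the dataset).
demo = pd.DataFrame(
    {"steel": [10.0, 20.0], "sand": [100.0, 200.0], "gold": [0.001, 0.002]},
    index=["City A", "City B"],
)
totals = group_totals(demo)
```

A call such as `load_all_sheets("Data source.xlsx")` would return the sheets (for example 'Code', 'Scope', and 'Statistics') keyed by name, provided an Excel engine such as openpyxl is installed.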

Code availability
The data gaps were filled in Python 3.8, using the following libraries: pandas 1.